Training Software Engineering Agents and Verifiers with SWE-Gym
We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents.
SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language.
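As a rough illustration of what one task instance bundles together, here is a minimal sketch. The class name, field names, and the `is_resolved` helper are hypothetical and are not taken from the SWE-Gym release. The rule that an instance counts as resolved only when every listed unit test passes is likewise an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskInstance:
    """Hypothetical container for one task instance (field names are illustrative)."""
    repo: str                # identifier of the codebase the task belongs to
    problem_statement: str   # the task, specified in natural language
    environment_image: str   # assumed handle to the executable runtime environment
    test_ids: List[str] = field(default_factory=list)  # unit tests used for checking


def is_resolved(instance: TaskInstance, test_results: Dict[str, bool]) -> bool:
    """Assumed rule: resolved only if every listed test is reported as passing.

    A test that is missing from `test_results` is treated as a failure.
    """
    if not instance.test_ids:
        return False
    return all(test_results.get(test_id, False) for test_id in instance.test_ids)
```

Treating missing results as failures keeps the check conservative: an instance is never counted as resolved because a test silently did not run.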
We use SWE-Gym to train language-model-based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets.
Figure 2: Breakdown of repositories
Getting the unit tests to run for each repository is difficult
Reward model that predicts unit test results -> inference-time scaling (Figure 1, bottom)
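One common way a reward model can enable inference-time scaling is best-of-n selection: sample several candidate patches, score each one, and keep the highest-scoring candidate. The sketch below assumes that setup. It is not confirmed as the exact procedure used with SWE-Gym. `select_best` and the `score` callable are hypothetical names, and `score` stands in for a learned model that estimates how likely a patch is to pass the unit tests.

```python
from typing import Callable, Sequence


def select_best(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (best-of-n selection).

    `score` is a stand-in for a reward model; here it is any function that
    maps a candidate patch to a number, where higher means more promising.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    # max() returns the first candidate among ties, so the choice is deterministic.
    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy scores in place of real reward-model outputs (purely illustrative).
    toy_scores = {"patch_a": 0.2, "patch_b": 0.9, "patch_c": 0.5}
    print(select_best(list(toy_scores), toy_scores.__getitem__))
```

The appeal of this design is that no unit tests need to be executed at selection time: only the scoring function is called, once per candidate.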